programming4us
           
 
 
SQL Server

How SQL Server FTS Works

12/18/2010 5:21:08 PM
As mentioned previously, in SQL Server 2008 the full-text catalogs are stored inside the database rather than in the file system, and the full-text engine is integrated into the SQL Server process. This redesign has resulted in many architectural changes in SQL Server 2008 Full-Text Search.

The two main components of Full-Text Search are as follows:

  • Indexing: Extracts the textual content from your data and stores the words or tokens in inverted file indexes

  • Searching: Queries these inverted file indexes and returns the rows that match the query

Indexing

The indexing engine connects to your database and extracts the content from the tables you are full-text indexing. It then sends this stream to COM components called filters (or IFilters). These COM components run in an out-of-process service called the Filter Daemon Host (fdhost.exe). Each filter understands a particular file format and can extract the text data from it. For example, if you store XML or Word documents in your database, the filters can interpret this text or binary data and emit the words and tokens they find in it. The default text filter is used for char, varchar, and text columns (and their Unicode counterparts), and the XML filter is used for xml columns. If you are indexing documents stored in varbinary columns, the indexing engine reads the document type column for each row and launches the filter that corresponds to the value stored there.

If you are storing Word documents in a varbinary column, and in your full-text index creation statement you specified a document type column called DocumentType, the contents of that column for each such row should be doc, .doc, docx, or .docx.
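As a minimal sketch of how a document type column is declared, the following script assumes a hypothetical table dbo.Documents with a primary key named PK_Documents and a catalog named MyCatalog. These names are illustrative only and do not come from a real schema.

```sql
-- Hypothetical table; all object names here are illustrative.
CREATE TABLE dbo.Documents
(
    DocumentId   INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Documents PRIMARY KEY,
    DocumentType NVARCHAR(10)   NOT NULL,  -- file extension, e.g. '.doc' or '.docx'
    Content      VARBINARY(MAX) NULL       -- the stored document itself
);
GO

CREATE FULLTEXT CATALOG MyCatalog;
GO

-- TYPE COLUMN names the column that holds each row's file extension,
-- which the indexer uses to choose a filter for that row.
CREATE FULLTEXT INDEX ON dbo.Documents
(
    Content TYPE COLUMN DocumentType
)
KEY INDEX PK_Documents
ON MyCatalog;
GO
```

The GO separators assume the script is run from a tool such as SQL Server Management Studio or sqlcmd.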

You can list the document types for which a filter is installed by running the following query:

select document_type from sys.fulltext_document_types

Each filter understands the file format of the type of document it indexes. For example, the Word filter understands the file formats for Word documents and emits the textual data it finds in the Word documents; the XML filter understands the XML documents and emits the textual data it finds in them.

If you need to index documents whose file type does not appear in the results of sys.fulltext_document_types, you need to install the appropriate filter on the server running SQL Server 2008 and then allow SQL Server 2008 to use it.

To allow SQL Server to use these third-party IFilters, you need to issue the following command:

sp_FullText_Service 'load_os_resources',1

This command loads the filter if it is installed on the OS, and in most cases this is sufficient. However, SQL Server also tries to verify the signature or certificate embedded in the COM component that implements the filter, which can cause problems in two ways. First, the filter may not be signed, or SQL Server may be unable to validate its certificate with the issuing authority, in which case the filter is not loaded. Second, validation takes time, so the first queries can be slow while the check proceeds. For these two reasons, you might want to disable the signature check by using the following command:

sp_FullText_Service 'verify_signature',0
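To confirm that these two settings took effect, you can read them back and check whether the extension you need is now registered. The following is a small sketch; the '.pdf' extension is only an example and assumes a PDF filter has been installed on the server.

```sql
-- Read the current service-level settings (1 = enabled, 0 = disabled).
SELECT FULLTEXTSERVICEPROPERTY('LoadOSResources') AS load_os_resources,
       FULLTEXTSERVICEPROPERTY('VerifySignature') AS verify_signature;

-- Check whether a filter is registered for a given extension.
-- '.pdf' is an example; substitute the extension you installed a filter for.
SELECT document_type, path, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';
```

If the second query returns no rows, the filter is not visible to SQL Server and documents of that type will not be indexed.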

Microsoft has published documentation on how to develop your own filters. For more information on how to do this, consult

http://msdn2.microsoft.com/en-us/library/ms916793.aspx

The filters then send the stream of textual data they emit to another set of components called word breakers. The word breaker that is used depends on the language you specified for indexing the contents of your columns.

The neutral word breaker basically breaks words at whitespace boundaries and at punctuation (, . : ; ’ “ ! -) and indexes only alphanumeric characters.

The English (U.S.) and British (or International English) word breakers index hyphenated words both without the hyphens and as their component words, so data-base is indexed as data, base, and database. They also index capitalized acronyms both as single letters and as a whole word. For example, F.B.I. is indexed as f, b, i, and fbi (words are indexed in lowercase).

The English (U.S.) and British English word breakers are nearly identical, except that different stems may be used during searching. For example, speakers of U.S. English may say oriented, whereas speakers of British English may say orientated. (Oriented is now more common in Canada, but orientated is more common in the rest of the English-speaking world outside the United States.)

The German and Dutch word breakers index compound words as both the compound and its constituent words. For example, the German word Volkswagen is indexed as volkswagen, volks, and wagen.
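One way to see what a given word breaker emits is the sys.dm_fts_parser dynamic management function, which is available in SQL Server 2008 and requires membership in the sysadmin fixed server role. The sketch below runs the sample terms from this section through the neutral, English (U.S.), and German word breakers. The exact rows returned depend on the word breaker versions installed, so no output is shown here.

```sql
-- sys.dm_fts_parser(query_string, lcid, stoplist_id, accent_sensitivity)
--   stoplist_id NULL   = do not apply a stoplist
--   accent_sensitivity = 0 (insensitive) or 1 (sensitive)

-- Neutral word breaker (LCID 0)
SELECT display_term, occurrence
FROM sys.dm_fts_parser(N'"data-base F.B.I."', 0, NULL, 0);

-- English (U.S.) word breaker (LCID 1033)
SELECT display_term, occurrence
FROM sys.dm_fts_parser(N'"data-base F.B.I."', 1033, NULL, 0);

-- German word breaker (LCID 1031)
SELECT display_term, occurrence
FROM sys.dm_fts_parser(N'"Volkswagen"', 1031, NULL, 0);
```

Comparing the first two result sets shows how the same input is tokenized differently depending on the language chosen.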

For Far Eastern languages, the word breakers break the sentence at whitespace and then go through the “words” and extract characters. In some Far Eastern languages, characters appear contiguous to each other in blocks that appear to Westerners as words. In fact, each character is a word unto itself, and characters can be combined to form new words. These characters may be indexed singly or in multiple character combinations.

By default, the indexing process uses the word breaker for the default full-text language configured through sp_configure, unless you specify that the contents of the columns you are full-text indexing should be indexed in a different language:

exec sp_configure 'show advanced options',1
reconfigure with override
exec sp_configure 'default full-text language'
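To override the server default for individual columns, you can name a language in the full-text index definition. The following sketch assumes a hypothetical table dbo.ProductNotes with a unique key index PK_ProductNotes, two text columns, and an existing catalog named MyCatalog; all of these names are illustrative.

```sql
-- List the languages (and their LCIDs) available for full-text indexing.
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;

-- Hypothetical table and columns; each column gets its own word breaker.
CREATE FULLTEXT INDEX ON dbo.ProductNotes
(
    NotesEnglish LANGUAGE 1033,  -- English (U.S.)
    NotesGerman  LANGUAGE 1031   -- German
)
KEY INDEX PK_ProductNotes
ON MyCatalog;
```

Columns listed without a LANGUAGE clause fall back to the server's default full-text language.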

Some documents contain language-specific tags that launch different word breakers from the ones you specify on your server or in your full-text index creation statement. For example, Word and XML documents have language tags embedded in them. If your Word documents are in German, and you specify the French word breaker in your full-text index creation statement, your Word documents are indexed in German, not French.

When the word breakers have done their work, the stop lists are applied and the stop words are removed. The remaining words are then sent to the full-text indexes. The full-text indexes store positional information, so they know where a word occurs in a document. These word positions still account for the stop words that were removed, so the distance between the remaining words is preserved.
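The effect of a stop list on word positions can be inspected with sys.dm_fts_parser (sysadmin membership required). This sketch passes a sample phrase through the English word breaker with the system stop list applied. The special_term column flags which tokens are treated as stop words, and the occurrence column gives each token's position. The phrase is an arbitrary example.

```sql
-- stoplist_id 0 = apply the system stoplist.
SELECT display_term,
       special_term,  -- indicates whether the token is a stop (noise) word
       occurrence     -- position of the token in the input
FROM sys.dm_fts_parser(N'"the cat sat on the mat"', 1033, 0, 0)
ORDER BY occurrence;
```

Running the same query with NULL in place of 0 for stoplist_id shows the tokens without any stop list applied, which makes the two results easy to compare.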

At any one time, there may be multiple temporary, memory-resident full-text indexes. Periodically, these temporary full-text indexes are consolidated into a single master full-text index. This process is called a master merge. You can force a master merge by reorganizing a catalog (using the T-SQL statement ALTER FULLTEXT CATALOG MyCatalog REORGANIZE, where your catalog is named MyCatalog) or by optimizing it (an option available to you in the Catalog Properties dialog).
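A rough way to observe a master merge is to look at the index fragments before and after reorganizing. The sketch below assumes that the fragments exposed by sys.fulltext_index_fragments correspond to the separate indexes described above, and it uses a hypothetical full-text indexed table dbo.Documents in a catalog named MyCatalog.

```sql
-- Fragments currently making up the full-text index on the
-- hypothetical table dbo.Documents.
SELECT fragment_id, status, data_size, row_count
FROM sys.fulltext_index_fragments
WHERE table_id = OBJECT_ID(N'dbo.Documents');

-- Force a master merge for every index in the catalog.
ALTER FULLTEXT CATALOG MyCatalog REORGANIZE;

-- 1 = a master merge is in progress, 0 = no merge in progress.
SELECT FULLTEXTCATALOGPROPERTY('MyCatalog', 'MergeStatus') AS merge_status;
```

Once the merge status returns to 0, rerunning the first query should show fewer fragments than before.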

Searching

Although the indexer launches word breakers and filters as out-of-process SQL Server components, the search process runs entirely within the SQL Server engine. To query the full-text indexes, you need to use the CONTAINS or FREETEXT predicates or their rowset analogs (CONTAINSTABLE and FREETEXTTABLE).
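The following sketch shows the general shape of these queries. It assumes a hypothetical table dbo.Articles with an integer key column ArticleId and a full-text indexed column Body; the table, columns, and search terms are illustrative only.

```sql
-- CONTAINS: precise matching with Boolean operators.
SELECT ArticleId
FROM dbo.Articles
WHERE CONTAINS(Body, N'index AND catalog');

-- FREETEXT: looser matching on the meaning of the phrase.
SELECT ArticleId
FROM dbo.Articles
WHERE FREETEXT(Body, N'rebuilding a catalog');

-- CONTAINSTABLE: returns a rowset with [KEY] and [RANK] columns
-- that can be joined back to the base table and sorted by relevance.
SELECT a.ArticleId, ft.[RANK]
FROM CONTAINSTABLE(dbo.Articles, Body, N'index AND catalog') AS ft
JOIN dbo.Articles AS a
    ON a.ArticleId = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```

The predicates are usually the simpler choice inside a WHERE clause, whereas the rowset functions are useful when you need the rank of each match.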

Just as the indexer applies the default server full-text language for indexing, the search process applies the default full-text language for searching. Consider a search on the French word courir (to run). If you were to search in English on this word, the search would be conducted on courir and courirs. However, on a server whose default full-text language is French, the search would be conducted on couraient, courais, courait, courant, coure, courent, coures, courez, couriez, courions, courir, courons, courra, courrai, courraient, courrais, courrait, courras, courrez, courriez, courrions, courrons, courront, cours, court, couru, courue, courues, courumes, coururent, courus, courusse, courussent, courusses, courussiez, courussions, courut, courutes.
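You can also choose the search language per query rather than relying on the server default. The sketch below assumes the same kind of hypothetical table, dbo.Articles, with a full-text indexed Body column. The last query uses sys.dm_fts_parser (sysadmin membership required) to list the forms the French stemmer generates.

```sql
-- Search using French stemming (LCID 1036).
SELECT ArticleId
FROM dbo.Articles
WHERE CONTAINS(Body, N'FORMSOF(INFLECTIONAL, courir)', LANGUAGE 1036);

-- The same search using English stemming (LCID 1033).
SELECT ArticleId
FROM dbo.Articles
WHERE CONTAINS(Body, N'FORMSOF(INFLECTIONAL, courir)', LANGUAGE 1033);

-- Inspect the inflectional forms generated for the French search.
SELECT display_term, expansion_type
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, courir)', 1036, 0, 0);
```

The generated forms only find matches if the column was indexed with a compatible word breaker, so the query language is normally kept the same as the indexing language.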

Now that you understand the architecture of Full-Text Search, let’s discuss how to create full-text catalogs.

Note

The 2005 version of the AdventureWorks database can be installed using the same installer that installs the AdventureWorks2008 or AdventureWorks2008R2 database. If you didn’t install AdventureWorks when you installed either of these sample databases, simply relaunch the installer and choose to install the AdventureWorks OLTP database.
